-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: Improve containsLower performance using quick rejection #15076
Conversation
Sounds like test failure so will need to check |
j := 0 | ||
for len(line) > 0 { | ||
// ascii fast case | ||
if c := line[0]; c < utf8.RuneSelf && substr[j] < utf8.RuneSelf { | ||
if c == substr[j] || c+'a'-'A' == substr[j] || c == substr[j]+'a'-'A' { | ||
j++ | ||
if j == len(substr) { | ||
return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting I've always thought Loki was using strings.Index
but I guess one cannot ignore character casing there.
Anyhow, during my work on a SIMD substring search I've found that the underlying Rabin-Karp algorithm is soo much faster. I'm curious if it could be adapted to be case-insensitive 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't for case sensitive yes. What do you think of the PR though that would help me to get a review.
// Fast ASCII comparison | ||
if c < utf8.RuneSelf && s < utf8.RuneSelf { | ||
if c != s && c+'a'-'A' != s && c != s+'a'-'A' { | ||
matched = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to follow this. Don't you have to start over from linePos
here instead of breaking out of the loops?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean does this pass baz
in foobarbbbaBAZfoobar
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok, we jump to line 504 and then increase i
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one comment, apart from that lgtm
line = line[lwid:] | ||
continue | ||
|
||
if unicode.ToLower(lr) != mr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we also check in the other direction like we did for ASCII?
if unicode.ToLower(lr) != mr { | |
if unicode.ToLower(lr) != mr && unicode.ToLower(mr) != lr { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the substr is always lowered already.
What;s annoying though here is that we decode runes for the substr on every pass, this could be improved ! But that's for another PR. ^^
This pull request includes improvements to the
containsLower
function inpkg/logql/log/filter.go
for better performance and adds benchmark tests to ensure the changes are effective. The key changes include optimizing the case-insensitive substring search and introducing comprehensive benchmark tests.Optimizations to
containsLower
function:pkg/logql/log/filter.go
: Optimized thecontainsLower
function by implementing a more efficient search algorithm that first locates the potential starting byte of the substring and then verifies the rest of the substring with both fast ASCII and slower Unicode comparisons.Addition of benchmark tests:
pkg/logql/log/filter_test.go
: Added a new benchmark functionBenchmarkContainsLower
with various test cases to evaluate the performance of thecontainsLower
function under different scenarios, including short and long lines, matches and non-matches, and lines with Unicode characters.Result:
*What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Checklist
CONTRIBUTING.md
guide (required)feat
PRs are unlikely to be accepted unless a case can be made for the feature actually being a bug fix to existing behavior.docs/sources/setup/upgrade/_index.md
deprecated-config.yaml
anddeleted-config.yaml
files respectively in thetools/deprecated-config-checker
directory. Example PR